Add zip64 #211
Conversation
So if I understand, there's no post-data descriptor for zip64, but the central directory has the values? Then you can't use streaming reads on zip64? That's pretty shitty. I guess they updated APPNOTE.txt:

So there's currently no way to forward-only read entries out of a zip64 archive. Would a good test just be to force zip64 on a small file? The values getting overloaded shouldn't really matter.
**Streaming**

Yes, that is my understanding: streaming cannot write 64-bit values in the post-data. I know they previously said that the post-data header should have 8-byte values for zip64, but they forgot to mention how to figure out if the file is zip64 when reading forward-only. The update you mention is probably a thought that there should be some kind of identifier in the post-data region. You can still use forward-only reading, but you cannot verify that the lengths are correct if the stream is larger than 4GiB.

Edit: This problem is worst if you write the archive forward-only AND read the archive forward-only. If the stream is seekable, it is possible to forgo the post-data and use the zip64 extras, such that forward-only reading is possible.

**Testing**

You can force zip64 to add the extra headers, but I would only expect errors to show up when values are actually hitting the 4-byte limits.

**Misc**

One detail I forgot to mention in the initial text is that the list of files reports the count as a 32-bit value.
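To make the forward-only ambiguity concrete, here is a minimal C# sketch of reading a post-data descriptor — not the library's actual reader — where the `isZip64` flag is precisely the information a forward-only reader does not have (it only exists in the central directory):

```csharp
using System.IO;

static class DataDescriptorSketch
{
    // Optional signature that may precede a data descriptor (APPNOTE 4.3.9).
    const uint Signature = 0x08074b50;

    // The descriptor itself carries no marker saying whether the sizes are
    // 4 bytes (classic) or 8 bytes (zip64); only the central directory knows.
    public static (uint Crc, ulong Compressed, ulong Uncompressed) Read(
        BinaryReader reader, bool isZip64)
    {
        uint first = reader.ReadUInt32();
        uint crc = first == Signature ? reader.ReadUInt32() : first;

        return isZip64
            ? (crc, reader.ReadUInt64(), reader.ReadUInt64())
            : (crc, reader.ReadUInt32(), reader.ReadUInt32());
    }
}
```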
You could always just say upfront to write everything as zip64 and deal with streaming that way. Then write post descriptors with 8 bytes. I guess that would be incompatible with other implementations? I guess if the code allows forward-only reading, but you have to read the central directory for the valid length, that's something. However, that's not how the reader works now. The central directory isn't used with readers, only archives. Ugh. I guess the count should be `ulong` like everything else. This code really needs some refactor lovin' too, I think.
**Streaming**

I guess we can look at "what others do", but that is likely to cause errors. If the readers expect 4-byte values, there should at least be some indicator somewhere that the values are not 4-byte.

**Count overflow**

Yes, but that would mean replacing the system collections.
I usually test with WinRAR and 7Zip myself. I guess I see the point about the collections. Let's save that for a rainy day.
Maybe a better approach is to simply throw an exception if that case is hit, i.e. if the stream is non-seekable and the input data is larger than 4GiB?
Yes. Explicit exceptions are good if there's no fix. At least they let people know what's happening. Thanks again!
Should we do the same for the case where we use a seekable stream and do not have space for the zip64 extra fields? Maybe we could then choose not to make space for them by default (remove my 2GiB threshold) and make the caller set the zip64 flag when the archive or stream being created exceeds 4GiB?
I would rather be explicit in these cases and force the user to know what they're doing, especially when getting into zip64 file-size territory. I guess just measure, then throw exceptions if it's not zip64 mode.
Ok, I will update the code to require explicit zip64 activation when writing streams larger than 4GiB, and disallow writing streams larger than 4GiB in forward-only mode. Archives larger than 4GiB can be transparently supported, as long as all streams inside are individually less than 4GiB.
…ngth (12 bytes) are not included in the reported length.
…ve limits, and zip64 is not enabled. This changes the logic, such that archives larger than 4GiB are still automatically written correctly (only the central header is special). Archives with individual streams larger than 4GiB must set the zip64 flag, either on the archive or the individual streams.
…be read in both Archive and Stream mode
I have updated the logic to require setting the zip64 flag before writing a stream larger than 4GiB. I have added the check to the writer as well.

Not sure how you prefer this to work, but we cannot know in advance if a write expands the file beyond 4GiB, so I have made a pre-check and a post-check. If we catch it in the pre-check, the file is correct (but with a shorter stream); if we catch it in the post-check, invalid data is now in the archive.

I have also added a unit test for trying out various parts of the zip64 features.
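A minimal sketch of that pre-/post-check split — illustrative names, not the PR's actual code:

```csharp
using System;

static class Zip64Checks
{
    const long Limit = uint.MaxValue;

    // Pre-check: run before compressing/writing the next chunk. If this
    // throws, nothing bad has been written; the archive is valid, just
    // with a shorter stream than intended.
    public static void PreCheck(long uncompressedSoFar, int nextChunk, bool zip64)
    {
        if (!zip64 && uncompressedSoFar + nextChunk > Limit)
            throw new NotSupportedException(
                "Entry would exceed 4GiB; set the zip64 flag on the entry or the archive.");
    }

    // Post-check: run once the compressed size is finally known. If this
    // throws, oversized values have already been written, so the archive
    // now contains invalid data.
    public static void PostCheck(long compressedTotal, bool zip64)
    {
        if (!zip64 && compressedTotal > Limit)
            throw new NotSupportedException(
                "Compressed entry exceeded 4GiB; the archive is now invalid.");
    }
}
```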
It looks like you throw an exception when the file produced would be invalid, so that seems sufficient.
This adds zip64 writing support.
The way zip64 is implemented is by appending a set of "extra" values to the header.
In the zip file there are two headers for each file: local (before the stream) and central (end of file).
The central header is simple enough, and most implementations simply use this and mostly ignore the other header. Once we are writing the central header, we have all the information required, so we can just write it.
For the local header, there is a tradeoff. The "extra" bytes take up 2+2+8+8=20 bytes per entry. This header is only required if either stream size (compressed or uncompressed) exceeds `uint.MaxValue`, but we do not know the length of the stream before writing. The dilemma is: do we write it for all files, in case one is too long? Or do we not write it and risk overflowing the values?
Since the header is "mostly ignored" we could live with this being broken. On the other hand, if we can use it we should.
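For concreteness, the local-header form of the zip64 extended-information extra field (tag `0x0001`, APPNOTE 4.5.3) accounts for those 20 bytes; a minimal sketch, not the PR's exact writer:

```csharp
using System.IO;

static class Zip64ExtraSketch
{
    // Zip64 extended-information extra field, local-header form:
    // 2-byte tag + 2-byte size + 8-byte uncompressed + 8-byte compressed = 20 bytes.
    const ushort Tag = 0x0001;

    public static void Write(BinaryWriter writer, ulong uncompressedSize, ulong compressedSize)
    {
        writer.Write(Tag);              // header ID
        writer.Write((ushort)16);       // data size: two 8-byte values follow
        writer.Write(uncompressedSize); // original (uncompressed) size
        writer.Write(compressedSize);   // compressed size
        // When this field is used, the fixed local header stores 0xFFFFFFFF
        // in its 4-byte size fields to signal "look in the extra field".
    }
}
```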
I have added a hint value to the `ZipWriteEntryOptions` that can force this header on and off. I have also added a flag in `ZipWriterOptions` that enables the extra fields for all entries by default. If the caller does not set any flags, as I assume most won't, I use a threshold of 2GiB to toggle zip64 headers: if the underlying stream is already 2GiB or more, zip64 is automatically enabled in the local header.

This is not a perfect solution, but I figure most users write smaller zip files and will never notice, and larger zip files are not really impacted by 20 bytes here and there. You can of course defeat the scheme by writing a 1.9GiB file and then a +4GiB file, thus hitting the limitations before the automatic upgrade kicks in.
If the stream is non-seekable, we have another issue: the writer would normally set a flag and then write the CRC, uncompressed size, and compressed size in a special trailing header. This trailing header has not been updated to use zip64, so we cannot write the correct values in it. We also cannot use both trailing headers and "extra" data. This was clarified by PKWare: https://blogs.oracle.com/xuemingshen/entry/is_zipinput_outputstream_handling_of
In the case of streaming, the local headers are written with the trailing data, which may overflow. But the headers do contain the CRC32 value, and may contain correct data if the sizes are less than `uint.MaxValue`. Again, the central directory header has the correct values.

Not sure how to deal with testing, as it requires files over 4GiB to hit the limitations.
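One way to test without +4GiB fixtures is to force the zip64 extras on a small file, as discussed above. Roughly (the `UseZip64` flag name is an assumption from this PR; boundary behaviour, where values actually cross `uint.MaxValue`, still needs real large files):

```csharp
using System.IO;
using SharpCompress.Common;
using SharpCompress.Writers;
using SharpCompress.Writers.Zip;

// Write a tiny archive with zip64 extras forced on, so the header layout
// can be inspected and round-tripped without multi-gigabyte test data.
using (var fileStream = File.Create("zip64-small.zip"))
using (var writer = WriterFactory.Open(fileStream, ArchiveType.Zip,
           new ZipWriterOptions(CompressionType.Deflate) { UseZip64 = true }))
{
    writer.Write("small.txt", new MemoryStream(new byte[1024]));
}
```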